Robust Learning, Smoothing, and Parameter Tying on Syntactic Ambiguity Resolution

Authors

  • Tung-Hui Chiang
  • Yi-Chung Lin
  • Keh-Yih Su

Abstract

Statistical approaches to natural language processing generally obtain their parameters by maximum likelihood estimation (MLE). MLE-based approaches, however, may fail to perform well on difficult tasks, because discrimination and robustness are not taken into consideration in the estimation process. Motivated by this concern, this paper proposes a discrimination- and robustness-oriented learning algorithm that aims to minimize the error rate. When the robust learning procedure is evaluated on a corpus of 1,000 sentences, 64.3% of the sentences are assigned their correct syntactic structures, whereas the MLE approach achieves an accuracy rate of only 53.1%. In addition, parameters are usually estimated poorly when the training data is sparse, so smoothing the parameters is important in the estimation process. Accordingly, we use a hybrid approach that combines the robust learning procedure with a smoothing method; this approach attains an accuracy rate of 69.8%. Finally, a parameter tying scheme is proposed that ties together highly correlated but unreliably estimated parameters, so that they can be better trained in the learning process. With this tying scheme, the number of parameters is reduced by a factor of 2,000 (from 8.7 × 10^8 to 4.2 × 10^5), and the accuracy rate for parse tree selection improves to 70.3% when the robust learning procedure is applied to the tied parameters.
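
To make the contrast with MLE concrete, the following is a minimal sketch of an error-driven, margin-based parameter adjustment in the spirit of the discrimination- and robustness-oriented learning described above. It is not the paper's exact algorithm. It assumes that each candidate parse is scored by summing log-probability parameters of its features, and that learning starts from MLE-style estimates. The feature names, the starting values, `margin`, and `step` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: each candidate parse is a list of feature names,
# and its score is the sum of the corresponding log-probability parameters.
Params = Dict[str, float]
Example = Tuple[List[Sequence[str]], int]  # (candidate parses, index of correct parse)


def score(params: Params, features: Sequence[str]) -> float:
    """Sum of log-probability parameters for one candidate parse."""
    return sum(params.get(f, 0.0) for f in features)


def best_index(params: Params, candidates: List[Sequence[str]]) -> int:
    """Index of the highest-scoring candidate (the first one wins ties)."""
    return max(range(len(candidates)), key=lambda i: score(params, candidates[i]))


def robust_update(params: Params, candidates: List[Sequence[str]], gold: int,
                  margin: float = 1.0, step: float = 0.1) -> bool:
    """Adjust parameters when the correct parse does not beat its strongest
    rival by at least `margin`. Returns True if anything changed."""
    rivals = [i for i in range(len(candidates)) if i != gold]
    rival = max(rivals, key=lambda i: score(params, candidates[i]))
    gap = score(params, candidates[gold]) - score(params, candidates[rival])
    if gap >= margin:
        return False
    # Reward features of the correct parse and penalize those of the rival;
    # a feature shared by both gets +step and -step, so it is left unchanged.
    for f in candidates[gold]:
        params[f] = params.get(f, 0.0) + step
    for f in candidates[rival]:
        params[f] = params.get(f, 0.0) - step
    return True


def train(params: Params, data: List[Example], epochs: int = 10,
          margin: float = 1.0, step: float = 0.1) -> Params:
    """Sweep the corpus until no sentence violates the margin (or epochs run out)."""
    for _ in range(epochs):
        changed = False
        for candidates, gold in data:
            changed |= robust_update(params, candidates, gold, margin, step)
        if not changed:
            break
    return params


def accuracy(params: Params, data: List[Example]) -> float:
    """Fraction of sentences whose top-scoring parse is the correct one."""
    return sum(best_index(params, c) == g for c, g in data) / len(data)


# Toy PP-attachment data with made-up "MLE" starting values (purely illustrative).
mle_params = {
    "attach_noun": -0.5, "attach_verb": -1.0,
    "p_with_noun": -1.0, "p_with_verb": -1.2,
    "p_of_noun": -0.7, "p_of_verb": -2.1,
}
data: List[Example] = [
    ([["attach_noun", "p_with_noun"], ["attach_verb", "p_with_verb"]], 1),
    ([["attach_noun", "p_of_noun"], ["attach_verb", "p_of_verb"]], 0),
]

before = accuracy(mle_params, data)
learned = train(dict(mle_params), data)  # copy, so the starting values are kept
after = accuracy(learned, data)
print(f"accuracy before: {before:.2f}, after: {after:.2f}")
# prints "accuracy before: 0.50, after: 1.00"
```

In this sketch, parameters change only when the correct parse fails to beat its strongest competitor by the required margin. This reflects the idea of targeting the error rate directly instead of the likelihood. The margin is included so that correct decisions are not left sitting on the decision boundary, which is one simple way to approximate "robustness."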

Similar resources

Semantic Priming Effect on Relative Clause Attachment Ambiguity Resolution in L2

This study examined whether processing ambiguous sentences containing relative clauses (RCs) following a complex determiner phrase (DP) by Persian-speaking learners of L2 English with different proficiency and working memory capacities (WMCs) is affected by semantic priming. The semantic relationship studied was one between the subject/verb of the main clause and one of the DPs in the complex D...

Smoothing and tying for Korean flexible vocabulary isolated word recognition

For large-vocabulary recognition systems, as well as for flexible-vocabulary applications using hidden Markov models (HMMs), parameter smoothing and tying have been used to increase the reliability of models. This paper describes bottom-up and top-down clustering techniques for state-level tying. This paper also describes a method of applying parameter smoothing to the clustered states and covarianc...

Syntactic Ambiguity Resolution Using A Discrimination and Robustness Oriented Adaptive Learning Algorithm

In this paper, a discrimination- and robustness-oriented adaptive learning procedure is proposed to deal with the task of syntactic ambiguity resolution. Owing to the problem of insufficient training data and the approximation error introduced by the language model, traditional statistical approaches, which resolve ambiguities by indirectly and implicitly using the maximum likelihood method, fail to achi...

Evaluation of the Regularization Algorithm to Decorrelation of Covariance Matrix of Float Ambiguity in Fast Resolution of GPS Ambiguity Parameters

Precise positioning in Real Time Kinematic (RTK) applications depends on the accurate resolution of the phase ambiguities. In RTK positioning, ambiguity parameters are highly correlated, especially when the positioning rate is high. Consequently, application of de-correlation techniques for the accurate resolution of ambiguities is inevitable. Phase ambiguity as positioning observations by the ...

An Incremental Bayesian Model for Learning Syntactic Categories

We present an incremental Bayesian model for the unsupervised learning of syntactic categories from raw text. The model draws information from the distributional cues of words within an utterance, while explicitly bootstrapping its development on its own partially learned knowledge of syntactic categories. Testing our model on actual child-directed data, we demonstrate that it is robust to noise...

Journal:
  • Computational Linguistics

Volume 21

Publication date: 1995